32 research outputs found

    Imbalanced data as risk factor of discriminating automated decisions: a measurement-based approach

    Over the last two decades, the number of organizations, in both the public and private sectors, that have automated their decision processes has grown notably. The phenomenon has been enabled by the availability of massive amounts of personal data and by software systems that use those data to optimize decisions with respect to certain goals. Today, software systems are involved in a wide range of decisions that are relevant to people's lives and to the exercise of their rights and freedoms. Illustrative examples are systems that score individuals on their ability to pay back a debt, recommenders of the best candidates for a job or a housing advertisement, or tools for the automatic moderation of online debates. While the advantages of algorithmic decision making mainly concern scalability and economic affordability, several critical aspects have emerged, including systematic adverse impacts on individuals belonging to minorities and disadvantaged groups. In this context, the terms data bias and algorithm bias have become familiar to researchers, industry leaders and policy makers, and much ink has been spilled on the concept of algorithm fairness, with the aim of producing more equitable results and avoiding discrimination. Our approach differs from the main corpus of research on algorithm fairness in that we shift the focus from the outcomes of automated decision making systems to their inputs and processes. We lay the foundations of a risk assessment approach based on a measurable characteristic of input data, i.e. imbalance, which can lead to discriminating automated decisions. We then relate imbalance to existing standards and risk assessment procedures. We believe that the proposed approach can be useful to a variety of stakeholders, e.g. producers and adopters of automated decision making software, policy makers, and certification or audit authorities.
This would allow the risk of discrimination to be assessed when imbalanced data are used in decision making software, and such an assessment should prompt all the stakeholders involved to take appropriate actions to prevent adverse effects. Such discrimination, in fact, poses a significant obstacle to human rights and freedoms as our societies increasingly rely on automated decision making. This work is intended to help mitigate this problem and to contribute to the development of software systems that are socially sustainable and in line with the shared values of our democratic societies.
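The abstract does not give a formula for its imbalance measure, but the general idea of quantifying how skewed a protected attribute's distribution is can be sketched as follows. The function name and the choice of indicators (majority/minority ratio and normalized Shannon entropy) are illustrative assumptions, not the paper's actual metrics.

```python
from collections import Counter
from math import log

def imbalance_measures(values):
    """Simple imbalance indicators for a protected attribute (illustrative).

    Returns the imbalance ratio (majority count / minority count) and the
    normalized Shannon entropy (1.0 = perfectly balanced, 0.0 = one group only).
    """
    counts = Counter(values)
    ratio = max(counts.values()) / min(counts.values())
    freqs = [c / len(values) for c in counts.values()]
    if len(freqs) == 1:
        entropy = 0.0
    else:
        entropy = -sum(p * log(p) for p in freqs) / log(len(freqs))
    return ratio, entropy

# Example: a binary attribute with a strong majority group (90 vs 10)
ratio, balance = imbalance_measures(["M"] * 90 + ["F"] * 10)
```

Either indicator could then be compared against a threshold to flag a dataset as a discrimination risk before any model is trained on it.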

    Effect of Heat Current on Magnetization Dynamics in Magnetic Insulators and Nanostructures

    The term "spin caloritronics" denotes a branch of spintronics that focuses on the interplay between electron spins and heat currents. Within this research area, this thesis investigates the effect of a heat current on magnetization dynamics in two types of systems and materials: magnetic insulators and metallic nanostructures. In the first case we study yttrium iron garnet (YIG) samples subjected to a temperature gradient. The irreversible thermodynamics of a continuous medium with magnetic dipoles predicts that a thermal gradient across a YIG slab, in the presence of magnetization waves, produces a magnetic field that is the magnetic analog of the well-known Seebeck effect. This thermally induced field can influence the time evolution of the magnetization, such that the relaxation of the precession can be modulated by applying a heat current. We found evidence for such a magnetic Seebeck effect (MSE) by conducting transmission measurements on a thin slab of YIG subjected to an in-plane temperature gradient. We showed how the MSE can modulate the magnetic damping depending on the direction of the propagating magnetostatic modes with respect to the orientation of the temperature gradient. In the second part of the thesis we focus on metallic nanostructures subjected to a heat current. In a metal, the three-current model (currents of entropy, of spin-up and of spin-down electrons) predicts that a heat current induces a spin current, which then influences the magnetization dynamics as a charge-driven spin current would. Hence, we explore what has been called the thermal spin torque in electrodeposited Co / Cu / Co asymmetric spin valves placed in the middle of copper nanowires. These samples are fabricated by conventional electrodeposition in porous polycarbonate membranes, using an original method that allows high-frequency electrical measurements.
We used a modulated laser to investigate the effect of a temperature gradient. We observed a heat-driven spin torque by electrically measuring the quasi-static magnetic response of a spin valve subjected to a heat current generated by two laser diodes heating the electrical contact at one end or the other of the nanowire. Analysing the variation in resistance induced by the heat-driven spin torque, visible as peaks occurring at the GMR transition, we found that a temperature difference of the order of 5 K is sufficient to produce a sizeable torque in spin valves.

    Organizing the Technical Debt Landscape

    To date, several methods and tools for detecting source code and design anomalies have been developed. While each method focuses on identifying certain classes of source code anomalies that potentially relate to technical debt (TD), the overlaps and gaps among these classes and TD have not been rigorously demonstrated. We propose to construct a seminal technical debt landscape as a way to visualize and organize research on the subject.

    SeMi: A SEmantic Modeling machIne to build Knowledge Graphs with graph neural networks

    SeMi (SEmantic Modeling machIne) is a tool to semi-automatically build large-scale Knowledge Graphs from structured sources such as CSV, JSON, and XML files. To achieve this goal, SeMi builds semantic models of the data sources in terms of concepts and relations within a domain ontology. Most research contributions on automatic semantic modeling focus on detecting the semantic types of source attributes. However, inferring the correct semantic relations between these attributes is critical to reconstructing the precise meaning of the data. SeMi covers the entire semantic modeling process: (i) it provides a semi-automatic step to detect semantic types; (ii) it exploits a novel approach to infer semantic relations, based on a graph neural network trained on background linked data. To the best of our knowledge, this is the first technique that exploits a graph neural network to support the semantic modeling process. Furthermore, the pipeline implemented in SeMi is modular, and each component can be replaced to tailor the process to very specific domains or requirements. This contribution can be considered a step towards automatic and scalable approaches for building Knowledge Graphs.
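The abstract does not specify SeMi's network architecture, but the core message-passing step that any graph neural network of this kind relies on can be sketched in a few lines of NumPy. This is a generic illustration of one graph-convolution layer, not SeMi's actual implementation; the function name and the mean-aggregation scheme are assumptions.

```python
import numpy as np

def gnn_layer(A, H, W):
    """One graph-convolution step (illustrative): aggregate neighbour
    features and apply a learned linear transform plus a ReLU.

    A: (n, n) adjacency matrix; H: (n, d) node features; W: (d, d') weights.
    Self-loops are added so each node also keeps its own features.
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-normalize (mean aggregation)
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)

# Tiny example: two connected nodes with one-hot features, identity weights
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.eye(2)
out = gnn_layer(A, H, np.eye(2))
```

Stacking several such layers lets each node's embedding absorb information from multi-hop neighbourhoods, which is what allows relations between source attributes to be scored from background linked-data graphs.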

    Integrating SQuARE data quality model with ISO 31000 risk management to measure and mitigate software bias

    In recent decades, the exponential growth of available information, together with the availability of systems able to learn the knowledge present in data, has pushed towards the complete automation of many decision-making processes in public and private organizations. This circumstance poses pressing ethical and legal issues, since a large number of studies and journalistic investigations have shown that software-based decisions, when based on historical data, perpetuate the prejudices and biases existing in society, resulting in a systematic and inescapable negative impact on individuals from minorities and disadvantaged groups. The problem is so relevant that the terms data bias and algorithm ethics have become familiar not only to researchers, but also to industry leaders and policy makers. In this context, we believe that the ISO SQuaRE standard, if appropriately integrated with risk management concepts and procedures from ISO 31000, can play an important role in democratizing the innovation of software-generated decisions by making the development of this type of software system more socially sustainable and in line with the shared values of our societies. In more detail, we identified two additional measures for a quality characteristic already present in the standard (completeness) and another that extends it (balance), with the aim of highlighting information gaps or the presence of bias in the training data. These measures serve as risk level indicators to be checked against common fairness measures that indicate the level of polarization of the software's classifications/predictions. The adoption of additional features with respect to the standard broadens its scope of application while maintaining consistency and conformity. The proposed methodology aims to find correlations between quality deficiencies and algorithm decisions, thus allowing their impact to be verified and mitigated.
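The mapping from quality measures to risk levels is not spelled out in the abstract; one plausible shape for such an indicator, in the spirit of ISO 31000 risk evaluation, is an ordinal classification driven by the worst-scoring quality measure. The thresholds and function name below are purely illustrative assumptions.

```python
def risk_level(balance, completeness, thresholds=(0.5, 0.8)):
    """Map balance and completeness scores in [0, 1] to an ordinal risk
    level (illustrative sketch; thresholds are hypothetical, not from the
    SQuaRE or ISO 31000 standards).

    The worst indicator drives the risk: a dataset that is complete but
    badly unbalanced is still flagged as risky.
    """
    score = min(balance, completeness)
    low, high = thresholds
    if score >= high:
        return "low"
    if score >= low:
        return "medium"
    return "high"

level = risk_level(balance=0.35, completeness=0.95)  # unbalanced data
```

A "medium" or "high" result would then trigger a check with fairness measures on the trained model's outputs, as the methodology describes.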

    Modeling the semantics of data sources with graph neural networks

    Semantic models are fundamental to publishing data into Knowledge Graphs (KGs), since they encode the precise meaning of data sources through concepts and properties defined within reference ontologies. However, building semantic models requires significant manual effort and expertise. In this paper, we present a novel approach based on Graph Neural Networks (GNNs) to build semantic models of data sources. GNNs are trained on Linked Data (LD) graphs, which serve as background knowledge to automatically infer the semantic relations connecting the attributes of a data source. To the best of our knowledge, this is the first approach that employs GNNs to identify such semantic relations. We tested our approach on 15 target sources from the advertising domain (used in other studies in the literature) and compared its performance against two baselines and a technique widely used in the state of the art. The evaluation showed that our approach outperforms the state of the art on the data sources with the largest number of semantic relations defined in the ground truth.

    Metrics for Identifying Bias in Datasets

    Nowadays, automated decision-making systems are pervasive, and increasingly often they are used to take important decisions in sensitive areas such as the granting of a bank overdraft, the susceptibility of an individual to a virus infection, or even the likelihood of reoffending. The widespread use of these systems raises a growing ethical concern about the risk of a potentially discriminatory impact. In particular, machine-learning systems trained on unbalanced data can give rise to systematic discrimination in the real world. One of the most important challenges is to determine metrics capable of detecting when an unbalanced training dataset may lead to discriminatory behaviour in the model built on it. In this paper, we propose an approach based on the notion of data completeness using two different metrics: one based on the combinations of values in the dataset, which serves as our benchmark, and a second using frame theory, widely used among others for quality measures of control systems. It is important to remark that the use of metrics cannot substitute for a broader design process that takes into account the columns that could lead to the presence of bias in the data. This line of research does not end with these activities, but aims to continue along the path towards a standardised register of measures.
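The first, combinations-of-values metric mentioned above lends itself to a short sketch: completeness as the fraction of possible value combinations of the sensitive columns that are actually represented in the dataset. The exact definition in the paper may differ; the function and column names here are illustrative.

```python
from itertools import product

def combination_completeness(rows, domains):
    """Fraction of possible value combinations actually observed
    (illustrative sketch of a combinations-based completeness metric).

    rows: list of tuples, one per record, restricted to the sensitive columns;
    domains: list of iterables giving each column's possible values.
    """
    possible = set(product(*domains))
    observed = set(rows) & possible
    return len(observed) / len(possible)

# Example: two binary sensitive attributes; the ("F", "old") subgroup
# never appears, so one of the four combinations is missing.
rows = [("M", "young"), ("M", "old"), ("F", "young")]
comp = combination_completeness(rows, [["M", "F"], ["young", "old"]])
```

A value below 1.0 signals that some subgroup is entirely absent from the training data, which is exactly the situation in which a model cannot have learned anything about that subgroup.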

    Detecting discriminatory risk through data annotation based on Bayesian inferences

    Thanks to the increasing growth of computational power and data availability, research in machine learning has advanced with tremendous rapidity. Nowadays, the majority of automated decision making systems are based on data. However, it is well known that machine learning systems can produce problematic results if they are built on partial or incomplete data. In fact, in recent years several studies have found a convergence of issues related to the ethics and transparency of these systems in the process of data collection and in how the data are recorded. Although rigorous data collection and analysis are fundamental to model design, this step is still largely overlooked by the machine learning community. For this reason, we propose a method of data annotation based on Bayesian statistical inference that aims to warn about the risk of discriminatory results from a given dataset. In particular, our method aims to deepen knowledge and promote awareness of the sampling practices employed to create the training set, highlighting that the probability of success or failure conditioned on minority membership is determined by the structure of the available data. We empirically test our system on three datasets commonly used by the machine learning community and investigate the risk of racial discrimination.
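The abstract does not describe its Bayesian machinery, but the basic building block for estimating a success probability conditioned on group membership is the conjugate Beta-Binomial update. The sketch below, with made-up counts, shows how an under-sampled minority group yields a posterior that is both shifted and much more uncertain, which is the kind of signal a data annotation could surface.

```python
from math import sqrt

def beta_posterior(successes, trials, a=1.0, b=1.0):
    """Posterior mean and standard deviation of a success probability
    under a Beta(a, b) prior with a Binomial likelihood (conjugate update).
    """
    a_post = a + successes
    b_post = b + trials - successes
    total = a_post + b_post
    mean = a_post / total
    var = a_post * b_post / (total ** 2 * (total + 1))
    return mean, sqrt(var)

# Illustrative counts: the minority group is far smaller in the sample,
# so its posterior carries much more uncertainty.
mean_major, sd_major = beta_posterior(450, 900)   # majority group
mean_minor, sd_minor = beta_posterior(5, 30)      # under-sampled minority
```

Annotating each sensitive subgroup with such posterior means and uncertainties makes the consequences of the sampling practice visible before the dataset is used for training.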